Abstract

We prove the following conjecture of Narayana: there are no nontrivial dominance refinements
of the Smirnov two-sample test if and only if the two sample sizes are relatively prime.
We also count the number of natural significance levels of the Smirnov two-sample test in 
terms of the sample sizes and relate this to the Narayana conjecture. In particular, Smirnov 
tests with relatively prime sample sizes turn out to have many more natural significance 
levels than do Smirnov tests whose sample sizes are not relatively prime (for example, equal 
sample sizes). 

Keywords Smirnov two-sample test, dominance refinement, Gnedenko path. 

AMS classification 62G10, 05A15 


1 Introduction 

Let $X_1, \ldots, X_m$ and $Y_1, \ldots, Y_n$ be independent random samples from continuous distribution
functions $F$ and $G$, respectively. In order to test nonparametrically whether $X_1$ is stochastically
smaller than $Y_1$, one often uses the Smirnov statistic

$$D^+_{mn} = \sup_t \bigl(F_m(t) - G_n(t)\bigr), \qquad (1)$$

where $F_m$ and $G_n$ are the empirical distribution functions of $X_1, \ldots, X_m$ and $Y_1, \ldots, Y_n$,
respectively.
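As a minimal sketch (our illustration, not code from the paper), the statistic (1) can be evaluated exactly with rational arithmetic. It assumes tie-free samples, which continuity guarantees almost surely, and uses the fact that the step function $F_m - G_n$ attains its supremum either at a pooled observation or for $t$ below all observations, where it equals 0. The function name `smirnov_upper` is hypothetical.

```python
from fractions import Fraction


def smirnov_upper(xs, ys):
    """Exact upper-tailed Smirnov statistic D+_{mn} = sup_t (F_m(t) - G_n(t)).

    Assumes no ties in the pooled sample (continuous F and G).
    """
    m, n = len(xs), len(ys)
    best = Fraction(0)  # value of F_m - G_n for t below every observation
    for t in sorted(xs + ys):
        f_m = Fraction(sum(x <= t for x in xs), m)  # empirical F_m(t)
        g_n = Fraction(sum(y <= t for y in ys), n)  # empirical G_n(t)
        best = max(best, f_m - g_n)
    return best


# Three X's, two Y's; the two smallest observations are X's.
print(smirnov_upper([0.1, 0.2, 0.5], [0.3, 0.4]))  # 2/3
```

For these samples, $mnD^+_{mn} = 6 \cdot 2/3 = 4$. The quadratic-time loop is chosen for transparency rather than speed.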
grant CHRX-CT93-0400, and the PRC Maths-Info.
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ti — 2 
^2 = 2 
ts = 4 


Figure 1: Representation of a Path 


Narayana (1971, pp. 43 ff.) constructs explicit dominance refinements of (upper-tailed)
Smirnov two-sample tests with equal sample sizes and calculates their power against Lehmann
alternatives $G = F^k$ ($k > 0$). In the ranges considered by Narayana ($3 \le m = n \le 10$), he shows
that the powers of these dominance refinements are uniformly greater than those of the Smirnov test. It
is thus of practical importance to know when dominance refinements exist.

Narayana (1975) stated without proof that nontrivial dominance refinements of the Smirnov two-sample
test exist if and only if $\gcd(m, n) > 1$. This claim was restated as a conjecture in Narayana (1979,
Exercise 9, p. 30). The purpose of this paper is to prove this conjecture. We will show that this
conjecture is closely related to the number of natural significance levels of the Smirnov two-sample
test.


2 Dominance refinements 


In this section, we explain what dominance refinements are. 

A convenient way to study the distribution of $D^+_{mn}$ is the so-called Gnedenko path. The
Gnedenko path $w$ of the samples $X_1, \ldots, X_m$ and $Y_1, \ldots, Y_n$ is defined as follows: $w$ is a path
from $(0, 0)$ to $(m, n)$ with unit steps $\omega_i$ to the east or north. If the $i$th value of the ordered
combined sample comes from $X_1, \ldots, X_m$, then $\omega_i$ is a step east; otherwise, it is a step north.
Since we assume that $F$ and $G$ are continuous, the probability of a tie (i.e., $X_i = Y_j$) is zero.
Hence, $w$ is almost surely well-defined. It is easy to see that under $H_0: F = G$, all paths from
$(0, 0)$ to $(m, n)$ are equiprobable, i.e., $P(w) = 1/\binom{m+n}{m}$ for all paths $w$. Now,

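$$mnD^+_{mn} \le r$$

if and only if all vertices $(x, y)$ of the Gnedenko path satisfy

$$nx - my \le r,$$

since $F_m(t) - G_n(t) = x/m - y/n = (nx - my)/(mn)$ whenever exactly $x$ of the $X$'s and $y$ of the
$Y$'s are at most $t$. In other words, $mnD^+_{mn} > r$ if and only if $w$ crosses below the line $nx - my = r$. A convenient way
to describe a path $w$ is to represent it by an $n$-tuple $(t_1, \ldots, t_n)$, where $t_i$ is the horizontal distance
from $(m, n - i)$ to $w$ (see Figure 1). The path $(s_1, \ldots, s_n)$ is said to dominate $(t_1, \ldots, t_n)$ if
$s_i \ge t_i$ for $i = 1, \ldots, n$.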
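The construction can be sketched in Python as follows. This is our illustration with hypothetical helper names; we read "horizontal distance from $(m, n - i)$ to $w$" as $m$ minus the rightmost $x$-coordinate that $w$ reaches at height $n - i$, a reading that reproduces the profiles $(2, 4)$ and $(2, 4, 5)$ in the examples of this section. The sketch also evaluates $mnD^+_{mn}$ as the largest value of $nx - my$ over the vertices of the path.

```python
def gnedenko_path(xs, ys):
    """Vertices of the Gnedenko path: an east step for each X, a north step for
    each Y, in the order of the pooled sorted sample (ties assumed absent)."""
    steps = sorted([(x, "E") for x in xs] + [(y, "N") for y in ys])
    x = y = 0
    vertices = [(0, 0)]
    for _, direction in steps:
        if direction == "E":
            x += 1
        else:
            y += 1
        vertices.append((x, y))
    return vertices


def mn_d_plus(vertices, m, n):
    """m*n*D+_{mn} as the largest value of n*x - m*y over the vertices."""
    return max(n * x - m * y for x, y in vertices)


def representation(vertices, m, n):
    """n-tuple (t_1, ..., t_n): t_i = m minus the rightmost x reached at height n - i."""
    rightmost = {}
    for x, y in vertices:
        rightmost[y] = max(rightmost.get(y, 0), x)
    return tuple(m - rightmost[n - i] for i in range(1, n + 1))


def dominates(s, t):
    """True if path s dominates path t, i.e. s_i >= t_i for every i."""
    return all(si >= ti for si, ti in zip(s, t))


w = gnedenko_path([0.1, 0.2, 0.5], [0.3, 0.4])  # m = 3, n = 2; steps E, E, N, N, E
print(mn_d_plus(w, 3, 2), representation(w, 3, 2))  # 4 (1, 1)
```

The value 4 agrees with $mn \sup_t (F_m(t) - G_n(t)) = 6 \cdot 2/3$ for the same samples.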

Let $\lceil x \rceil$ be the ceiling of $x$, i.e., the smallest integer larger than or equal to $x$, and
let $\lfloor x \rfloor$ be the floor of $x$, i.e., the largest integer not exceeding $x$. The $r$-profile is the path
$(\max(0, t_1), \ldots, \max(0, t_n))$, where $t_i$ is the ceiling of the horizontal distance from $(m, n - i)$
to the line $nx - my = r$. Clearly, the $r$-profile is the minimal path that lies above (possibly
touching) the line $nx - my = r$.
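Since the line $nx - my = r$ meets height $n - i$ at $x = (r + m(n - i))/n$, its horizontal distance from $(m, n - i)$ is $(mi - r)/n$, so $t_i = \lceil (mi - r)/n \rceil$. A small sketch under that derivation (the name `r_profile` is ours) computes profiles with exact integer arithmetic:

```python
def r_profile(m, n, r):
    """r-profile (max(0, t_1), ..., max(0, t_n)) with t_i = ceil((m*i - r) / n).

    The line n*x - m*y = r crosses height n - i at x = (r + m*(n - i)) / n,
    so its horizontal distance from (m, n - i) is (m*i - r) / n.
    """
    # -((r - m*i) // n) is the exact integer ceiling of (m*i - r) / n.
    return tuple(max(0, -((r - m * i) // n)) for i in range(1, n + 1))


for r in range(3):
    print(r, r_profile(4, 2, r))
# 0 (2, 4)
# 1 (2, 4)
# 2 (1, 3)
```

Integer floor division avoids any floating-point rounding in the ceiling.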

Thus, we may cast the (upper-tailed) Smirnov two-sample test completely in terms of Gnedenko
paths as follows: $mnD^+_{mn} \le r$ if and only if the Gnedenko path dominates the $r$-profile.
Hence, the Smirnov two-sample test is completely characterized by its $r$-profiles (i.e., we regard
the test as a set of critical regions, indexed by its natural significance levels, cf. Gibbons (1992,
p. 23)). This formulation shows that we attain more significance levels if we can insert intermediate
paths between consecutive $r$-profiles of the Smirnov two-sample test (see Narayana (1979,
Chapter 2)). A set of paths totally ordered by dominance is said to be a dominance refinement
of any set of paths included in it. Note that under this definition, we consider a test to
be a dominance refinement of itself, called the trivial dominance refinement. A set of paths
is saturated if it has no nontrivial dominance refinement.

Of course, there exist other refinements of the Smirnov test. Each partition of the set of
paths with a common value $r$ of the statistic $mnD^+_{mn}$ (i.e., all paths that touch but do not cross
the line $nx - my = r$) yields a refinement of the Smirnov test. For example, we can divide
the paths that touch but do not cross the line $nx - my = r$ according to the number of times
that they touch this line. Dominance refinements partition the set of paths with a
common value of $mnD^+_{mn}$ into dominance regions, i.e., collections of paths that dominate a given
path. An advantage of dominance refinements is that they can be described very efficiently by
simply listing the critical paths. Hence, the refined test can be performed graphically.

Another reason for considering dominance refinements (or the notion of dominance itself)
is the following relation with most powerful rank (MPR) tests. If $F$ and $G$ have densities $f$
and $g$, respectively, and the likelihood ratio $f/g$ is increasing (as is the case for the Lehmann
alternatives $H_a: G = F^k$, $k > 0$), then "$s$ dominates $t$" implies $P(t \mid G = F^k) \ge P(s \mid G = F^k)$ (see
Savage (1956)). Hence, if a path $s$ belongs to the critical region of an MPR test, then all paths
dominated by $s$ must also belong to this critical region. Thus, an MPR test at a fixed significance
level is a dominance test in the terminology of Narayana (1979, Chapter 3, p. 35). Conversely,
dominance tests are good approximations of MPR tests (see Narayana (1979, Chapter 3,
pp. 44-45)).



Figure 2: $m = 4$ and $n = 2$ (labels: $r = 0, 2, 4, 6$)


Figure 3: $m = 5$ and $n = 3$ (labels: $r = 0, 1, 2, 3, 4, 5, 6, 7, 12$)



Let us look at two examples in order to get a feeling for the Narayana conjecture. 

• (Figure 2, $m = 4$, $n = 2$) The 0-profile and the 1-profile coincide and are equal to the path
$(2, 4)$, whereas the 2-profile is the path $(1, 3)$. Thus, we see that there are two intermediate
paths between the 1-profile and the 2-profile: $(1, 4)$ and $(2, 3)$. Inserting either of these
paths, we obtain a refinement of the Smirnov test. Note that the 2-profile differs from the
1-profile in that it may pass through the points $(1, 0)$ and $(3, 1)$, which both lie on the
line $2x - 4y = 2$.
• (Figure 3, $m = 5$, $n = 3$) The 0-profile is the path $(2, 4, 5)$, and the 1-profile is the path
$(2, 3, 5)$. Thus, there is no intermediate path between these profiles; this is also true for the
other pairs of consecutive profiles. In other words, there is no nontrivial refinement. Note that each
line of the form $3x - 5y = r$ with $r > 0$ contains at most one lattice point of the rectangle
$0 \le x \le 5$, $0 \le y \le 3$, and that the profiles for $r = 0, 1, \ldots, 7$ are pairwise distinct.
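Both examples can be checked by brute force. The sketch below (our illustration; all names hypothetical) relies on the fact that the $n$-tuples representing paths are exactly the nondecreasing tuples with entries in $\{0, \ldots, m\}$ (the rightmost $x$-coordinate reached by a path cannot increase as the height decreases). It enumerates every tuple lying between two consecutive profiles in the dominance order and reports whether any pair of consecutive profiles admits an intermediate path.

```python
from itertools import product


def r_profile(m, n, r):
    """r-profile with entries max(0, ceil((m*i - r) / n)), i = 1..n."""
    return tuple(max(0, -((r - m * i) // n)) for i in range(1, n + 1))


def intermediate_paths(upper, lower):
    """All paths u with upper dominating u and u dominating lower, u distinct from both.

    A tuple is a path iff it is nondecreasing with entries in [0, m]; the bounds
    lower_i <= u_i <= upper_i already keep every entry in that range.
    """
    boxes = [range(lo, hi + 1) for lo, hi in zip(lower, upper)]
    return [
        u for u in product(*boxes)
        if all(a <= b for a, b in zip(u, u[1:])) and u not in (upper, lower)
    ]


def has_nontrivial_refinement(m, n):
    """Check every consecutive pair of r-profiles, r = 0, ..., nm."""
    return any(
        intermediate_paths(r_profile(m, n, r - 1), r_profile(m, n, r))
        for r in range(1, n * m + 1)
    )


print(intermediate_paths(r_profile(4, 2, 1), r_profile(4, 2, 2)))  # [(1, 4), (2, 3)]
print(has_nontrivial_refinement(4, 2), has_nontrivial_refinement(5, 3))  # True False
```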

3 Main Results 

The examples above indicate that the existence of dominance refinements depends on the number
of lattice points on lines of the form $nx - my = r$. In the following lemmas, we enumerate these
points. These lemmas are used in our proof of the Narayana conjecture.

For unexplained notions of number theory and combinatorics, we refer the reader to Stark 
(1970, Chapters 2 and 3) and Berge (1971). 

Lemma 1 Let $m$, $n$, and $r$ be positive integers, and let $d = \gcd(m, n)$.

1. The Diophantine equation

$$nx - my = r \qquad (2)$$

has integer solutions if and only if $d$ divides $r$.

2. If $(x, y)$ is a solution of (2), then the integer solutions of (2) are exactly

$$\{(x + tm/d,\; y + tn/d) : t \in \mathbb{Z}\}. \qquad (3)$$

Proof. 1. If $r = d$, then by the Euclidean algorithm there exist integer solutions $(x, y)$ of $nx - my = r$.
Obviously, this also holds if $r$ is a multiple of $d$. Conversely, if there exists an integer solution
$(x, y)$ of $nx - my = r$, then $r$ is a multiple of $d$, since $d$ divides both $m$ and $n$.

2. If $t$, $x$, and $y$ are integers such that $nx - my = r$, then $x' := x + tm/d$ and $y' := y + tn/d$
satisfy $nx' - my' = r$. Conversely, if $nx - my = r$ and $nx' - my' = r$, then subtraction yields
$n(x - x') = m(y - y')$. Cancelling the common factor $d$ and using the uniqueness of prime
factorizations, we see that there exists an integer $t$ such that $x - x' = tm/d$ and $y - y' = tn/d$.
□
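Lemma 1 is constructive: the extended Euclidean algorithm produces integers $u, v$ with $nu + mv = d$, and scaling by $r/d$ gives a particular solution, from which (3) generates all others. A minimal sketch (function names are our own):

```python
def extended_gcd(a, b):
    """Return (g, u, v) with a*u + b*v == g == gcd(a, b), for a, b >= 0."""
    if b == 0:
        return a, 1, 0
    g, u, v = extended_gcd(b, a % b)
    return g, v, u - (a // b) * v


def particular_solution(n, m, r):
    """One integer solution (x, y) of n*x - m*y = r, or None if gcd(m, n) does not divide r."""
    d, u, v = extended_gcd(n, m)  # n*u + m*v == d
    if r % d != 0:
        return None               # part 1 of Lemma 1
    k = r // d
    return u * k, -v * k          # n*(u*k) - m*(-v*k) == k*d == r


def solution_family(n, m, r, ts):
    """Solutions (x + t*m/d, y + t*n/d) from part 2 of Lemma 1, for the given t values."""
    d = extended_gcd(n, m)[0]
    x, y = particular_solution(n, m, r)
    return [(x + t * m // d, y + t * n // d) for t in ts]


print(particular_solution(2, 4, 1))        # None: gcd(2, 4) = 2 does not divide 1
print(solution_family(2, 4, 2, range(3)))  # [(1, 0), (3, 1), (5, 2)]
```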


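Lemma 2 Let $m$ and $n$ be positive integers with greatest common divisor $d$ and $m \ge n$. Let $r$
be a nonnegative integer. The Diophantine equation

$$nx - my = r \qquad (0 \le x \le m \text{ and } 0 \le y \le n) \qquad (4)$$

has integer solutions only if $d$ divides $r$ and $0 \le r \le nm$. In that case, the number of solutions
to the Diophantine equation (4) is given by

$$a_r := d + 1 - \left\lceil \frac{p + a}{m/d} \right\rceil, \qquad (5)$$

where

$$p := \left\lfloor \frac{r}{n} \right\rfloor$$

and

$$a := \left(\frac{r}{d} - p\,\frac{n}{d}\right)\left(\frac{n}{d}\right)^{-1} \bmod \frac{m}{d}.$$

Here, $(n/d)^{-1}$ denotes the inverse of $n/d$ in the ring $\mathbb{Z}_{m/d} = \{0, 1, \ldots, m/d - 1\}$.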
Proof. The first assertion is an immediate consequence of Lemma 1 and the fact that $(m, 0)$
lies on the line $nx - my = nm$.

Now, define $m' = m/d$, $n' = n/d$, and $r' = r/d$. By division with remainder, there exist
nonnegative integers $p$ and $q$ such that $r' = n'p + q$ with $0 \le q < n'$. Since $m'$ and $n'$ are relatively
prime, it follows from Lemma 1 that there exist integers $a$ ($0 \le a < m'$) and $b$ ($0 \le b < n'$) such
that $q = an' - bm'$. Thus, $r' = n'(p + a) - m'b$. Hence, $(p + a, b)$ is a solution of (2). We claim
that $(p + a, b)$ is the solution $(x, y)$ ($x \ge 0$) of (2) with minimal nonnegative $y$. This claim follows
from the above, since the next smaller solution of (2) has $y' = b - n' < 0$. Thus, the solutions of
(2) with $y \ge 0$ are exactly $\{(p + a + tm', b + tn')\}_{t \ge 0}$.

Now, impose $x \le m$ on this solution set. This leads to $t \le (m - p - a)/m'$. (The constraint $y \le n$ is
then automatic, since $my = nx - r \le nm$.) Therefore, the solutions to (4) are
$\{(p + a + tm', b + tn') : 0 \le t \le T\}$, where $T = \lfloor (m - p - a)/m' \rfloor = d - \lceil (p + a)/m' \rceil$.

Note that if $p + a > m$, then $T = -1$ and there are no solutions at all.

The value of $p$ follows from $n'p = r' - q$ ($0 \le q < n'$). It follows from $r' = n'(p + a) - m'b$ that
$n'a \equiv r' - n'p \pmod{m'}$. Since $m'$ and $n'$ are relatively prime, $n'$ has an inverse in $\mathbb{Z}_{m'}$. Thus,
$a \equiv (r' - n'p)(n')^{-1} \pmod{m'}$. Equality (5) follows since $0 \le a < m'$. □
Examples.

• (Figure 2, $m = 4$, $n = 2$) If $r = 1$, then there are no solutions, since $d = 2$ does not divide $r$.

If $r = 2$, then $p = \lfloor 2/2 \rfloor = 1$ and $n/d = 1$, hence $(n/d)^{-1} = 1 \bmod 2$ and $a = 0$. Thus, the
number of solutions equals $2 + 1 - \lceil 1/2 \rceil = 2 + 1 - 1 = 2$.

• (Figure 3, $m = 5$, $n = 3$) If $r = 1$, then $p = \lfloor 1/3 \rfloor = 0$ and $(n/d)^{-1} = 3^{-1} = 2 \bmod 5$,
since $6 \equiv 1 \bmod 5$. Thus, $a = 2$ and the number of solutions equals $1 + 1 - \lceil 2/5 \rceil =
1 + 1 - 1 = 1$. If $r = 14$, then $p = \lfloor 14/3 \rfloor = 4$, $a = (14 - 12) \cdot 2 \bmod 5 = 4$, and the number of solutions
equals $1 + 1 - \lceil 8/5 \rceil = 1 + 1 - 2 = 0$.
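Formula (5) is easy to check against a direct count. The sketch below is our illustration (hypothetical names); it assumes Python 3.8 or later for `pow(x, -1, mod)`, which computes modular inverses, and it reproduces the three examples above.

```python
from math import gcd


def a_r(m, n, r):
    """Number of solutions of (4) by formula (5) of Lemma 2.

    Assumes m >= n, d = gcd(m, n) divides r, and 0 <= r <= n*m.
    """
    d = gcd(m, n)
    m1, n1, r1 = m // d, n // d, r // d        # m', n', r'
    p = r // n                                  # p = floor(r/n) = floor(r'/n')
    inv = pow(n1, -1, m1) if m1 > 1 else 0      # (n/d)^{-1} in Z_{m/d}; in Z_1 everything is 0
    a = ((r1 - n1 * p) * inv) % m1
    return d + 1 - (-(-(p + a) // m1))          # d + 1 - ceil((p + a) / m')


def a_r_brute(m, n, r):
    """Direct count of lattice points on n*x - m*y = r with 0 <= x <= m, 0 <= y <= n."""
    return sum(1 for x in range(m + 1) for y in range(n + 1) if n * x - m * y == r)


print(a_r(4, 2, 2), a_r(5, 3, 1), a_r(5, 3, 14))  # 2 1 0
```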


Lemma 3 Let $m$ and $n$ be positive integers with greatest common divisor $d$. Let $s_k$ be the
number of positive multiples $r \le nm$ of $d$ such that $a_r = k$ (in Lemma 2), in other words, such that the
Diophantine equation (4) has precisely $k$ integer solutions. Then

1. $s_k = 0$ for $k > d$,

2. $s_k = nm/d^2$ for $0 < k < d$,

3. $s_0 = (nm - (n + m)d + d^2)/(2d^2)$, and

4. $s_d = (nm + (n + m)d - d^2)/(2d^2)$.


Proof. We use the notation of Lemma 2 and its proof. Without loss of generality, $m \ge n$.

1. This follows from the second part of Lemma 1.

2. By Lemma 2, $nx - my = r$ has $k$ ($0 < k < d$) solutions with the constraints $0 \le x \le m$
and $0 \le y \le n$ if and only if $\lceil (p + a)/m' \rceil = d + 1 - k$, in other words, if $m - m'k < p + a \le m - m'(k - 1)$.
Each $r'$ corresponds uniquely to a pair $(p, q)$ with either $0 \le q < n'$ and $0 \le p < m$, or $q = 0$
and $p = m$. First choose $q$, which can be done in $n'$ ways. This fixes $a$. Hence, $p$ must obey
$0 < m - m'k - a < p \le m - m'(k - 1) - a \le m - a \le m$, which has exactly $m'$ solutions.

3. If (4) has no solution, then by Lemma 1 we must have $r > nm - nm/d$, or equivalently $r' > nm/d - nm/d^2$. Each
line $n'x - m'y = r'$ in this range has 0 or 1 integer solutions with $0 \le x \le m$ and $0 \le y \le n$. Now, consider
all lattice points $(x, y)$ with $nx - my > nm - nm/d$, $x \le m$, and $y \ge 0$, i.e., the triangle spanned
by the points $(m - m', 0)$, $(m, 0)$, and $(m, n')$ with the points $(m - m', 0)$ and $(m, n')$ excluded.
It is easy to see that this triangle contains $A := ((m' + 1)(n' + 1) - 2)/2$ points. By the second
part of Lemma 1, each such point corresponds uniquely to a line $nx - my = r$ with exactly one
lattice point in the region $0 \le x \le m$ and $0 \le y \le n$. Hence, the other $nm/d^2 - A$ lines contain
no point, which gives the stated value of $s_0$.

4. This follows by elimination, since there are $nm/d$ values of $r$ under consideration. □
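Lemma 3 can likewise be confirmed numerically for small sample sizes. This sketch (our own, with hypothetical names) tabulates $s_k$ by counting lattice points on every line $nx - my = r$, $r = d, 2d, \ldots, nm$, and compares the result with the closed forms.

```python
from math import gcd


def s_counts(m, n):
    """s[k] = number of r in {d, 2d, ..., nm} whose line n*x - m*y = r contains
    exactly k lattice points with 0 <= x <= m, 0 <= y <= n."""
    d = gcd(m, n)
    s = [0] * (d + 2)  # the extra last slot would record violations of part 1
    for r in range(d, n * m + 1, d):
        k = sum(1 for x in range(m + 1) for y in range(n + 1) if n * x - m * y == r)
        s[min(k, d + 1)] += 1
    return s


def lemma3(m, n):
    """Closed forms of Lemma 3 as a list indexed by k = 0, ..., d."""
    d = gcd(m, n)
    s = [n * m // d**2] * (d + 1)                         # parts 2 (0 < k < d)
    s[0] = (n * m - (n + m) * d + d * d) // (2 * d * d)   # part 3
    s[d] = (n * m + (n + m) * d - d * d) // (2 * d * d)   # part 4
    return s


print(s_counts(4, 2), lemma3(4, 2))  # [0, 2, 2, 0] [0, 2, 2]
```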

Define the Bell-Stirling number $B_n$ to be the number of ordered partitions of $\{1, 2, \ldots, n\}$
into disjoint nonempty sets $S_1 \cup \cdots \cup S_k = \{1, 2, \ldots, n\}$. For example, $B_0 = 1$, $B_1 = 1$, $B_2 = 3$,
$B_3 = 13$, $B_4 = 75$, $B_5 = 541$. The Bell-Stirling numbers can also be defined implicitly by their
generating function (Motzkin (1971, p. 171))

$$\sum_{n=0}^{\infty} B_n \frac{t^n}{n!} = \frac{1}{2 - \exp(t)}.$$

Now, we prove the Narayana conjecture by counting dominance refinements and saturated 
dominance refinements. 
Theorem. Let $m$ and $n$ be positive integers with greatest common divisor $d$.

1. The Smirnov upper-tailed two-sample test with sample sizes $m$ and $n$ has exactly
$(d^2 + nm(2d - 1) + d(n + m))/(2d^2)$ natural levels.

2. All saturated dominance refinements of the Smirnov upper-tailed two-sample test with sample
sizes $m$ and $n$ have exactly $((n + 1)(m + 1) - (d + 1))/2 + 1$ natural levels.

3. (Narayana's Conjecture) The Smirnov upper-tailed two-sample test is saturated if and
only if the sample sizes are relatively prime.

4. The number of dominance refinements of the Smirnov upper-tailed two-sample test with
sample sizes $m$ and $n$ (including the trivial one) is given by the product

$$B_d^{(nm + (n + m)d - d^2)/(2d^2)} \prod_{k=1}^{d-1} B_k^{nm/d^2}.$$

5. The number of saturated dominance refinements of the Smirnov upper-tailed two-sample
test with sample sizes $m$ and $n$ is given by the product

$$(d!)^{(nm + (n + m)d - d^2)/(2d^2)} \prod_{k=1}^{d-1} (k!)^{nm/d^2}.$$

Proof. 1. We use the representation of the Smirnov test as a set of $r$-profiles. As noted above, the
$r$-profile is different from the $(r - 1)$-profile exactly when the Diophantine equation (4) has at least
one solution $(x, y)$. The number of levels thus corresponds to the number of $r$ such that $a_r > 0$
in Lemma 2 (including $r = 0$). We are thus led to sum $1 + s_1 + s_2 + \cdots + s_d$ in Lemma 3, which
equals $1 + (d - 1)nm/d^2 + (nm + (n + m)d - d^2)/(2d^2) = (d^2 + nm(2d - 1) + d(n + m))/(2d^2)$.

2. Let $(P_0, P_1, \ldots, P_k)$ be a saturated set of paths. The region strictly between two consecutive
paths $P_i$ and $P_{i+1}$ is exactly a unit square $\{(x, y) \mid x_i - 1 \le x \le x_i \text{ and } y_i \le y \le y_i + 1\}$ as in
Figure 2; otherwise additional paths could be inserted between $P_i$ and $P_{i+1}$. All points $(x, y)$
with $0 \le x \le m$, $0 \le y \le n$ and $nx - my > 0$ determine exactly one such unit square, since they
lie under the 0-profile. There are $((n + 1)(m + 1) - (d + 1))/2$ such points, since there are $d + 1$
lattice points on $nx - my = 0$ and the rotation $(x, y) \mapsto (m - x, n - y)$ exchanges the two sides of this
line. Thus, the chain has length $((n + 1)(m + 1) - (d + 1))/2$ and
consists of $((n + 1)(m + 1) - (d + 1))/2 + 1$ natural significance levels.

3. If $m$ and $n$ are relatively prime, then substituting $d = 1$ into parts 1 and 2 shows that
the Smirnov upper-tailed two-sample test has as many natural significance levels as its saturated
dominance refinements. Hence, the Smirnov test is saturated. Conversely, if $m$ and $n$ are not
relatively prime, then by Lemma 3 ($s_d > 0$ with $d \ge 2$) there is at least one line $nx - my = r$, $r > 0$, with at least two lattice points.
Hence, the Smirnov upper-tailed two-sample test admits a nontrivial dominance refinement.

4. Consider an $r$-profile $P$ with $a_r = k$, and the $(r - 1)$-profile $P'$. The area between $P$ and
$P'$ consists of $k$ unit squares as in Figure 3. New profiles can be "inserted" between $P$ and $P'$ by
passing the path across the $k$ unit squares in several steps instead of all at once. Let $S_1 \cup \cdots \cup S_l$
be an ordered partition of the $k$ unit squares. Define the intermediary path $P_i$ to pass under the
cells $S_1 \cup \cdots \cup S_i$ and over the cells $S_{i+1} \cup \cdots \cup S_l$, and to follow paths $P$ and $P'$ where they agree.
Each such ordered partition determines a dominance refinement of this step in the list of paths
that make up the Smirnov test. There are $B_k$ ordered partitions.

To consider the combinations of refinements of each step of the Smirnov test, we take the
product over all $r$.
m\n   3   4   5   6   7   8   9  10
  3   4  10  12   7  16  18  10  22
  4  10   5  15  12  20   9  25  19
  5  12  15   6  21  24  27  30  11
  6   7  12  21   7  28  22  18  27
  7  16  20  24  28   8  36  40  44
  8  18   9  27  22  36   9  45  35
  9  10  25  30  18  40  45  10  55
 10  22  19  11  27  44  35  55  11

Table 1: Number of natural significance levels of the upper-tailed Smirnov two-sample test.


m\n   3   4   5   6   7   8   9  10
  3   7  10  12  13  16  18  19  22
  4  10  11  15  17  20  21  25  27
  5  12  15  16  21  24  27  30  31
  6  13  17  21  22  28  31  34  38
  7  16  20  24  28  29  36  40  44
  8  18  21  27  31  36  37  45  49
  9  19  25  30  34  40  45  46  55
 10  22  27  31  38  44  49  55  56


Table 2: Number of natural significance levels of saturated dominance refinements of the upper-tailed
Smirnov two-sample test.


5. The reasoning is similar to that in part 4, except that we only consider ordered partitions
into blocks of size one. There are $k!$ such ordered partitions of a $k$-element set. □
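The closed forms of the Theorem are straightforward to evaluate. The sketch below is our illustration (hypothetical names); it computes the four counts of parts 1, 2, 4, and 5 from $d$, the $s_k$ of Lemma 3, and the Bell-Stirling numbers, so that part 3 can be checked by comparing the first two counts.

```python
from math import comb, factorial, gcd


def bell_stirling(N):
    """Ordered Bell (Bell-Stirling) numbers B_0, ..., B_N."""
    B = [1]
    for n in range(1, N + 1):
        B.append(sum(comb(n, k) * B[n - k] for k in range(1, n + 1)))
    return B


def theorem_counts(m, n):
    """(levels, saturated levels, #dominance refinements, #saturated refinements)
    from parts 1, 2, 4 and 5 of the Theorem."""
    d = gcd(m, n)
    s_mid = n * m // d**2                                  # s_k for 0 < k < d
    s_top = (n * m + (n + m) * d - d * d) // (2 * d * d)   # s_d
    levels = (d * d + n * m * (2 * d - 1) + d * (n + m)) // (2 * d * d)
    saturated_levels = ((n + 1) * (m + 1) - (d + 1)) // 2 + 1
    B = bell_stirling(d)
    refinements = B[d] ** s_top
    saturated = factorial(d) ** s_top
    for k in range(1, d):
        refinements *= B[k] ** s_mid
        saturated *= factorial(k) ** s_mid
    return levels, saturated_levels, refinements, saturated


print(theorem_counts(4, 2))  # (5, 7, 9, 4)
print(theorem_counts(5, 3))  # (12, 12, 1, 1)
```

Python integers are unbounded, so the products stay exact even when $d$ is large.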

Table 1 shows the number of natural significance levels for the upper-tailed Smirnov two-sample
test for various sample sizes. Note especially the low number of levels for Smirnov tests
with equal sample sizes (e.g., compare the 11 levels for $m = n = 10$ with the 55 levels for $m = 10$
and $n = 9$). Table 2 shows the number of natural significance levels for saturated dominance
refinements of the upper-tailed Smirnov two-sample test for various sample sizes. Note that,
unlike in Table 1, there are no big differences in Table 2.
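Both tables can also be regenerated without the Theorem. In this brute-force sketch (our own; names hypothetical), the entries of Table 1 are obtained as the number of distinct $r$-profiles for $r = 0, \ldots, nm$, using $t_i = \max(0, \lceil (mi - r)/n \rceil)$, which follows from the definition of the $r$-profile, and the entries of Table 2 as the number of lattice points strictly below the line $nx - my = 0$ plus one, following part 2 of the proof.

```python
def r_profile(m, n, r):
    """r-profile: entries max(0, ceil((m*i - r) / n)) for i = 1, ..., n."""
    return tuple(max(0, -((r - m * i) // n)) for i in range(1, n + 1))


def natural_levels(m, n):
    """Number of distinct r-profiles for r = 0, 1, ..., nm (cf. Table 1)."""
    return len({r_profile(m, n, r) for r in range(n * m + 1)})


def saturated_levels(m, n):
    """Lattice points of [0, m] x [0, n] with n*x - m*y > 0, plus one (cf. Table 2)."""
    below = sum(1 for x in range(m + 1) for y in range(n + 1) if n * x - m * y > 0)
    return below + 1


for m in range(3, 11):
    print(m, [natural_levels(m, n) for n in range(3, 11)],
          [saturated_levels(m, n) for n in range(3, 11)])
```

Each printed row lists, for one value of $m$, the Table 1 entries followed by the Table 2 entries for $n = 3, \ldots, 10$.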

If we can list the profiles of an upper-tailed Smirnov two-sample test, then using Kreweras'
theorem (Narayana (1971, p. 21)) we can also calculate the levels themselves rather than only their
total number. For example, in the case of equal sample sizes $m = n$, the profiles are the paths
$(1, 2, \ldots, m)$, $(0, 1, 2, \ldots, m - 1)$, ..., $(0, 0, \ldots, 0)$. In order to calculate the levels associated with
these profiles, we must calculate the number of paths dominated by these profiles and divide it by the
total number $\binom{2m}{m}$ of paths. According
to (a special case of) Kreweras' theorem, the number of paths dominated by a profile $(t_1, \ldots, t_m)$ is
the determinant of the matrix

$$\left[\binom{t_{m-j+1} + 1}{1 + j - i}_{+}\right]_{i,j=1}^{m},$$

where $\binom{y}{z}_+$ is defined as $\binom{y}{z}$ for $y \ge z \ge 0$ and 0 otherwise. E.g., for $m = n = 10$, we easily
calculate that the natural significance levels are $5.41 \times 10^{-6}$, $1.08 \times 10^{-5}$, $2.71 \times 10^{-5}$, $7.58 \times 10^{-5}$,
$2.27 \times 10^{-4}$, $7.14 \times 10^{-4}$, 0.00232, 0.00774, 0.0263, 0.0909, and 0.318.
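The computation for $m = n = 10$ can be reproduced with a short sketch (our illustration; all names are hypothetical). It evaluates the determinant above exactly with rational Gaussian elimination, cross-checks it against a dynamic-programming count of nondecreasing tuples $(s_1, \ldots, s_m)$ with $0 \le s_i \le t_i$ (our reading of "paths dominated by $(t_1, \ldots, t_m)$"), and divides by $\binom{20}{10}$. We take the elided profiles in the list above to be $(0, \ldots, 0, 1, 2, \ldots, m - k)$ with $k$ leading zeros.

```python
from fractions import Fraction
from math import comb


def binom_plus(y, z):
    """(y choose z)_+: the binomial coefficient if y >= z >= 0, else 0."""
    return comb(y, z) if y >= z >= 0 else 0


def det(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    a = [[Fraction(v) for v in row] for row in matrix]
    size, result = len(a), Fraction(1)
    for c in range(size):
        pivot = next((r for r in range(c, size) if a[r][c] != 0), None)
        if pivot is None:
            return 0
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            result = -result
        result *= a[c][c]
        for r in range(c + 1, size):
            factor = a[r][c] / a[c][c]
            for k in range(c, size):
                a[r][k] -= factor * a[c][k]
    return int(result)


def kreweras_count(t):
    """Paths dominated by t = (t_1, ..., t_m) as det[binom_plus(t_{m-j+1} + 1, 1 + j - i)].

    Indices are 0-based here; 1 + j - i is unchanged by the shift."""
    m = len(t)
    return det([[binom_plus(t[m - 1 - j] + 1, 1 + j - i) for j in range(m)]
                for i in range(m)])


def dp_count(t):
    """Same count by dynamic programming over nondecreasing s with 0 <= s_i <= t_i."""
    ways = [1] * (t[0] + 1)  # ways[v]: admissible prefixes ending with value v
    for bound in t[1:]:
        new, running = [], 0
        for v in range(bound + 1):
            running += ways[v] if v < len(ways) else 0  # prefixes ending at a value <= v
            new.append(running)
        ways = new
    return sum(ways)


m = 10
# Profiles (0, ..., 0, 1, 2, ..., m - k) with k leading zeros, k = m, ..., 0.
profiles = [tuple([0] * k + list(range(1, m - k + 1))) for k in range(m, -1, -1)]
levels = [kreweras_count(t) / comb(2 * m, m) for t in profiles]
print([f"{x:.3g}" for x in levels])
```

The resulting counts are the Catalan numbers $1, 2, 5, \ldots, 58786$, and the ratios agree to three significant figures with the levels listed above.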

Remark. In our theorem, we only considered the upper-tailed Smirnov test based on
$D^+_{mn} = \sup_t (F_m(t) - G_n(t))$. Of course, similar results exist for the Smirnov tests based on
$D^-_{mn} = \sup_t (G_n(t) - F_m(t))$ or $D_{mn} = \sup_t |F_m(t) - G_n(t)|$.